OpenMP Implementation and Performance on Embedded Renesas M32R Chip Multiprocessor
Abstract
The chip multiprocessor (CMP) is a promising processor architecture, not only for high performance but also for reducing power and energy consumption in embedded applications. We have implemented an OpenMP compiler for an embedded Renesas M32R chip multiprocessor as a parallel programming environment. In this paper, we report the preliminary performance of OpenMP benchmarks, including scientific and multimedia applications, on the M32R CMP. We found that OpenMP lets users obtain a reasonable performance improvement from the multiple CPUs in the CMP by inserting just a few directives. We also discuss the possibility of run-time scheduling of OpenMP threads and some compilation techniques for power-aware computing on the CMP.
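To illustrate what "just a few directives" means in practice, the following is a minimal sketch in standard OpenMP C. It is not taken from the paper: the function names `saxpy` and `sum_to` are hypothetical, and the code is not specific to the M32R compiler. Each serial loop becomes a parallel one by adding a single pragma.

```c
#include <stddef.h>

/* Hypothetical kernel (not from the paper): y = a*x + y.
   The pragma is the only OpenMP-specific line. A compiler without
   OpenMP enabled ignores it and runs the loop serially. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++) {
        y[i] = a * x[i] + y[i];   /* iterations are independent */
    }
}

/* Hypothetical reduction example: sum of the integers 1..n.
   The reduction clause gives each thread a private partial sum
   and combines the partial sums when the loop finishes. */
long sum_to(long n)
{
    long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 1; i <= n; i++) {
        sum += i;
    }
    return sum;
}
```

With GCC, for example, building with `-fopenmp` enables the directives, and the number of threads can be set through the standard `OMP_NUM_THREADS` environment variable. The results are the same whether the loops run serially or in parallel.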
Similar resources
OSCAR API for Real-Time Low-Power Multicores and Its Performance on Multicores and SMP Servers
OSCAR (Optimally Scheduled Advanced Multiprocessor) API has been designed for real-time embedded low-power multicores to generate parallel programs for various multicores from different vendors by using the OSCAR parallelizing compiler. The OSCAR API has been developed by Waseda University in collaboration with Fujitsu Laboratory, Hitachi, NEC, Panasonic, Renesas Technology, and Toshiba in an M...
Efficient Algorithms for Fixed-Point Arithmetic Operations in an Embedded PIM
∗ Effort sponsored by Defense Advanced Research Projects Agency (DARPA) through the Air Force Research Laboratory, USAF, under agreement number F30602-99-1-0521. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation thereon. The views and conclusions contained herein are those of the authors and should not inter...
Practical Compiler Techniques on Efficient Multithreaded Code Generation for OpenMP Programs
State-of-the-art multiprocessor systems pose several difficulties: (i) the user has to parallelize the existing serial code; (ii) explicitly threaded programs using a thread library are not portable; (iii) writing efficient multi-threaded programs requires intimate knowledge of the machine's architecture and micro-architecture. Thus, well-tuned parallelizing compilers are in high demand to leverage...
Coarse-Grain Task Parallel Processing Using the OpenMP Backend of the OSCAR Multigrain Parallelizing Compiler
This paper describes automatic coarse-grain parallel processing on a shared-memory multiprocessor system using a newly developed OpenMP backend of the OSCAR multigrain parallelizing compiler, targeting systems that range from a single-chip multiprocessor to a high-performance multiprocessor and a heterogeneous supercomputer cluster. The OSCAR multigrain parallelizing compiler exploits coarse-grain task parallelism and near fine gra...
Renesas Technology to Release SH7776 (SH-Navi3), Industry’s First Dual-Core SoC with Built-in Image Recognition Processing Function for Car Information Terminals
Tokyo, January 19, 2009 - Renesas Technology Corp. today announced the SH7776 (SH-Navi3), a dual-core system-on-chip (SoC) device with on-chip enhanced graphics functions and a high-performance image recognition processing function for the next generation of high-performance car information terminals that evolved from car navigation systems. The SH7776 (SH-Navi3) integrates two CPU cores on a singl...